Arabic OCR : Toward a complete system

Internal identifier: 001E50 (Main/Exploration)

Authors: A. El-Bialy [Egypt]; A. H. Kandil [Egypt]; M. Hashish [Egypt]; S. Yamany [United States]

Source:

SPIE proceedings series [ISSN 1017-2653]; 2000.

RBID: Pascal:01-0025673

French descriptors

Pascal (Inist)
- Reconnaissance optique caractère, Arabe, Caractéristique, Méthode, Segmentation, Caractère imprimé, Exemple.

English descriptors

KwdEn:
- Arabic, Characteristic, Example, Method, Optical character recognition, Printed character, Segmentation.

Abstract

Latin and Chinese OCR systems have been studied extensively in the literature, yet little work has been done on Arabic character recognition. This is due to the technical challenges posed by Arabic text. Because the script is cursive, robust and stable text segmentation is needed. In addition, building an Arabic OCR requires features that capture the rich representation of Arabic characters. In this paper, a novel font- and size-independent segmentation technique is introduced. It can segment a cursive text line even if the line is slightly skewed. The technique is insensitive to the location of the text line's centerline and can segment different font sizes and types (for different character sets) occurring on the same line. Feature extraction is considered one of the most important phases of a text reading system. Ideally, the features extracted from a character image should capture the essential characteristics of that character independently of font type and size. In such an ideal case, the classifier stores a single prototype per character. In practice, however, such an ideal feature set is hard to find. In this paper, a set of features reflecting the topological aspects of Arabic characters is proposed. Integrated with a topological matching technique, these features yield a semi-omnifont Arabic text reading system.
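The abstract does not say which topological features are used. As a rough illustration only, the sketch below computes two generic topological descriptors of a binarized character image: the number of connected ink components and the number of holes (enclosed background regions). These are standard textbook measures, not the authors' feature set; the function names and the 0/1 list-of-lists image format are assumptions made for the example. Such counts matter for Arabic because several letters share a body and differ only in their dots (ب, ت and ث carry one, two and three dots), while others enclose a loop.

```python
from collections import deque

# Hypothetical helpers for illustration; not taken from the paper.
STEPS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
STEPS_8 = STEPS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_regions(img, value, steps):
    """Count connected regions of cells equal to `value` (breadth-first flood fill)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] != value or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == value:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def topological_features(img):
    """Return (components, holes) for a binary glyph where 1 = ink, 0 = background."""
    w = len(img[0])
    # Pad with a background border so all outer background forms a single region.
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    # Ink uses 8-connectivity; background uses the dual 4-connectivity.
    components = count_regions(padded, 1, STEPS_8)
    # Every background region except the outer one is a hole.
    holes = count_regions(padded, 0, STEPS_4) - 1
    return components, holes


# A closed loop with one dot beneath it.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(topological_features(glyph))  # (2, 1)
```

Because both counts are unchanged by scaling or moderate stroke-width changes, they hint at why topological descriptors are attractive for font- and size-independent recognition; a real system would combine them with many other features.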

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000754
to stream PascalFrancis, to step Curation: 000039
to stream PascalFrancis, to step Checkpoint: 000747
to stream Main, to step Merge: 001F59
to stream Main, to step Curation: 001E50

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Arabic OCR : Toward a complete system</title>
<author><name sortKey="El Bialy, A" sort="El Bialy, A" uniqKey="El Bialy A" first="A." last="El-Bialy">A. El-Bialy</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Systems And Biomedical Dept., Faculty of Eng., Cairo University</s1>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Systems And Biomedical Dept., Faculty of Eng., Cairo University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kandil, A H" sort="Kandil, A H" uniqKey="Kandil A" first="A. H." last="Kandil">A. H. Kandil</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Systems And Biomedical Dept., Faculty of Eng., Cairo University</s1>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Systems And Biomedical Dept., Faculty of Eng., Cairo University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hashish, M" sort="Hashish, M" uniqKey="Hashish M" first="M." last="Hashish">M. Hashish</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Systems And Biomedical Dept., Faculty of Eng., Cairo University</s1>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Systems And Biomedical Dept., Faculty of Eng., Cairo University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Yamany, S" sort="Yamany, S" uniqKey="Yamany S" first="S." last="Yamany">S. Yamany</name>
<affiliation wicri:level="2"><inist:fA14 i1="02"><s1>CVIP Lab, University of Louisville</s1>
<s2>Louisville, KY 40292</s2>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Kentucky</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">01-0025673</idno>
<date when="2000">2000</date>
<idno type="stanalyst">PASCAL 01-0025673 INIST</idno>
<idno type="RBID">Pascal:01-0025673</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000754</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000039</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000747</idno>
<idno type="wicri:doubleKey">1017-2653:2000:El Bialy A:arabic:ocr:toward</idno>
<idno type="wicri:Area/Main/Merge">001F59</idno>
<idno type="wicri:Area/Main/Curation">001E50</idno>
<idno type="wicri:Area/Main/Exploration">001E50</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Arabic OCR : Toward a complete system</title>
<author><name sortKey="El Bialy, A" sort="El Bialy, A" uniqKey="El Bialy A" first="A." last="El-Bialy">A. El-Bialy</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Systems And Biomedical Dept., Faculty of Eng., Cairo University</s1>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Systems And Biomedical Dept., Faculty of Eng., Cairo University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Kandil, A H" sort="Kandil, A H" uniqKey="Kandil A" first="A. H." last="Kandil">A. H. Kandil</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Systems And Biomedical Dept., Faculty of Eng., Cairo University</s1>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Systems And Biomedical Dept., Faculty of Eng., Cairo University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hashish, M" sort="Hashish, M" uniqKey="Hashish M" first="M." last="Hashish">M. Hashish</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Systems And Biomedical Dept., Faculty of Eng., Cairo University</s1>
<s3>EGY</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Égypte</country>
<wicri:noRegion>Systems And Biomedical Dept., Faculty of Eng., Cairo University</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Yamany, S" sort="Yamany, S" uniqKey="Yamany S" first="S." last="Yamany">S. Yamany</name>
<affiliation wicri:level="2"><inist:fA14 i1="02"><s1>CVIP Lab, University of Louisville</s1>
<s2>Louisville, KY 40292</s2>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Kentucky</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="2000">2000</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Arabic</term>
<term>Characteristic</term>
<term>Example</term>
<term>Method</term>
<term>Optical character recognition</term>
<term>Printed character</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance optique caractère</term>
<term>Arabe</term>
<term>Caractéristique</term>
<term>Méthode</term>
<term>Segmentation</term>
<term>Caractère imprimé</term>
<term>Exemple</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Latin and Chinese OCR systems have been studied extensively in the literature, yet little work has been done on Arabic character recognition. This is due to the technical challenges posed by Arabic text. Because the script is cursive, robust and stable text segmentation is needed. In addition, building an Arabic OCR requires features that capture the rich representation of Arabic characters. In this paper, a novel font- and size-independent segmentation technique is introduced. It can segment a cursive text line even if the line is slightly skewed. The technique is insensitive to the location of the text line's centerline and can segment different font sizes and types (for different character sets) occurring on the same line. Feature extraction is considered one of the most important phases of a text reading system. Ideally, the features extracted from a character image should capture the essential characteristics of that character independently of font type and size. In such an ideal case, the classifier stores a single prototype per character. In practice, however, such an ideal feature set is hard to find. In this paper, a set of features reflecting the topological aspects of Arabic characters is proposed. Integrated with a topological matching technique, these features yield a semi-omnifont Arabic text reading system.</div>
</front>
</TEI>
<affiliations><list><country><li>Égypte</li>
<li>États-Unis</li>
</country>
<region><li>Kentucky</li>
</region>
</list>
<tree><country name="Égypte"><noRegion><name sortKey="El Bialy, A" sort="El Bialy, A" uniqKey="El Bialy A" first="A." last="El-Bialy">A. El-Bialy</name>
</noRegion>
<name sortKey="Hashish, M" sort="Hashish, M" uniqKey="Hashish M" first="M." last="Hashish">M. Hashish</name>
<name sortKey="Kandil, A H" sort="Kandil, A H" uniqKey="Kandil A" first="A. H." last="Kandil">A. H. Kandil</name>
</country>
<country name="États-Unis"><region name="Kentucky"><name sortKey="Yamany, S" sort="Yamany, S" uniqKey="Yamany S" first="S." last="Yamany">S. Yamany</name>
</region>
</country>
</tree>
</affiliations>
</record>
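As displayed here, the record uses the `inist:` and `wicri:` prefixes without any `xmlns` declarations, so a strict XML parser would reject the whole record on its own. The `<affiliations>` element, however, is prefix-free. The minimal sketch below, assuming Python's standard `xml.etree.ElementTree`, parses that element (with the `sortKey`, `sort`, `uniqKey` and `first` attributes trimmed for brevity) and groups author names by country. The function name `authors_by_country` is hypothetical and not part of Dilib.

```python
import xml.etree.ElementTree as ET

# <affiliations> element copied from the record above, attributes trimmed.
AFFILIATIONS = """<affiliations><list><country><li>Égypte</li>
<li>États-Unis</li>
</country>
<region><li>Kentucky</li>
</region>
</list>
<tree><country name="Égypte"><noRegion><name last="El-Bialy">A. El-Bialy</name>
</noRegion>
<name last="Hashish">M. Hashish</name>
<name last="Kandil">A. H. Kandil</name>
</country>
<country name="États-Unis"><region name="Kentucky"><name last="Yamany">S. Yamany</name>
</region>
</country>
</tree>
</affiliations>"""


def authors_by_country(xml_text):
    """Map each country under <tree> to the display names of its authors."""
    root = ET.fromstring(xml_text)
    result = {}
    # Only <tree> carries country names as attributes; <list> repeats them as <li> items.
    for country in root.find("tree").findall("country"):
        # iter() also descends into the <region> and <noRegion> wrappers.
        result[country.get("name")] = [n.text for n in country.iter("name")]
    return result


print(authors_by_country(AFFILIATIONS))
```

Restricting the search to `<tree>` avoids picking up the `<country>` element inside `<list>`, which has no `name` attribute and holds `<li>` items instead of `<name>` elements.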

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001E50 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001E50 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:01-0025673
   |texte=   Arabic OCR : Toward a complete system
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
